The following will make sure you have what you need to run the rest of the code.

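library(tidyverse)  # attaches dplyr, ggplot2, and the other core tidyverse packages
# anonymized modeling data; the path is relative to the working directory
model_variables = read.csv('data/model_variables_anonymized.csv')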

Machine Learning

Use the caret package to fit and tune machine learning models

Start with some pre-processing of the data
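
As a sketch of what this pre-processing step might look like, the helper below holds out part of the data as a test set using caret's `createDataPartition`. The helper name `split_train_test`, the 80/20 split, and the seed are assumptions made for illustration, and the outcome column is whatever name the caller passes in.

```r
library(caret)  # install.packages('caret') if needed

# Hypothetical helper: hold out a share of rows as a test set,
# sampling within each outcome class so class balance is roughly preserved.
split_train_test = function(data, outcome, p = .8, seed = 1234) {
  set.seed(seed)  # so the same rows are selected on every run
  y = factor(data[[outcome]])
  # list = FALSE returns a one-column matrix of row indices; keep it as a vector
  train_idx = createDataPartition(y, p = p, list = FALSE)[, 1]
  list(
    train = data[train_idx, , drop = FALSE],   # rows used for fitting
    test  = data[-train_idx, , drop = FALSE]   # held-out rows
  )
}
```

For the data loaded earlier, a call might look like `parts = split_train_test(model_variables, 'libuser')`, where `libuser` is assumed to be the name of the outcome column.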

Example with XGBoost
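
One way to set up an XGBoost example with caret is to define the grid of tuning values and the cross-validation scheme up front. The particular values below are illustrative assumptions rather than recommendations. caret's `xgbTree` method expects all seven of these columns to be present in the grid, even those held at a single value.

```r
library(caret)  # install.packages(c('caret', 'xgboost')) if needed

# Every combination of these values is tried; the values are illustrative only.
xgb_opts = expand.grid(
  eta = c(.3, .4),               # learning rate
  max_depth = c(9, 12),          # maximum tree depth
  colsample_bytree = c(.6, .8),  # share of columns sampled for each tree
  subsample = c(.5, .75, 1),     # share of rows sampled per boosting round
  nrounds = 100,                 # boosting iterations; more rounds take longer
  min_child_weight = 1,
  gamma = 0
)

nrow(xgb_opts)  # 2 * 2 * 2 * 3 = 24 candidate settings

# 10-fold cross-validation to compare the candidates
cv_opts = trainControl(method = 'cv', number = 10)
```

With 24 candidates and 10 folds, this grid implies 240 model fits before the final refit, which is why the number of boosting rounds is kept modest here.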

Run in parallel
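
Because every grid point is refit in each cross-validation fold, tuning is a natural candidate for parallel processing. The sketch below registers a doParallel backend, which caret's `train` uses when one is available. The helper name `fit_xgb_parallel` and its arguments are hypothetical; the formula, data, tuning grid, and resampling control are supplied by the caller.

```r
library(caret)       # install.packages(c('caret', 'xgboost', 'doParallel')) if needed
library(doParallel)  # also attaches foreach and parallel

# Hypothetical helper: tune an xgbTree model with the work spread across cores.
fit_xgb_parallel = function(formula, data, grid, control) {
  n_cores = parallel::detectCores()
  # leave one core free; fall back to a single worker if detection fails
  n_workers = if (is.na(n_cores)) 1 else max(1, n_cores - 1)

  cl = parallel::makeCluster(n_workers)
  # shut the workers down and restore sequential execution, even on error
  on.exit({
    parallel::stopCluster(cl)
    foreach::registerDoSEQ()
  }, add = TRUE)
  registerDoParallel(cl)

  train(
    formula,
    data = data,
    method = 'xgbTree',
    preProcess = c('center', 'scale'),  # standardize predictors
    trControl = control,
    tuneGrid = grid
  )
}
```

A call might look like `results_xgb = fit_xgb_parallel(libuser ~ ., X_train, xgb_opts, cv_opts)`, where those object names are placeholders for the training data, tuning grid, and cross-validation settings.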


Machine Learning
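
A tuned model is usually judged on data it has not seen. As a minimal sketch, the hypothetical helper below predicts on a held-out set and tabulates the results with caret's `confusionMatrix`. The outcome column name and the positive class label are assumptions passed in by the caller.

```r
library(caret)  # install.packages('caret') if needed

# Hypothetical helper: compare predictions with observed classes on held-out data.
evaluate_on_test = function(fit, test_data, outcome, positive) {
  observed = factor(test_data[[outcome]])
  # align predicted levels with the observed ones, as confusionMatrix requires
  predicted = factor(predict(fit, test_data), levels = levels(observed))
  confusionMatrix(predicted, observed, positive = positive)
}
```

For a fitted caret model `fit` and held-out rows `X_test`, a call might look like `evaluate_on_test(fit, X_test, 'libuser', 'yes')`, with both names and the `'yes'` label assumed for illustration.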

Python

With machine learning, we finally get to a point where Python is on par with, and typically surpasses, R.

Most techniques that would fall under the heading of machine learning are first developed in Python.

For at least some techniques, Python will typically run faster, possibly notably so, but this depends on many factors.
